Compression By Induction of Hierarchical Grammars

Authors

  • Craig G. Nevill-Manning
  • Ian H. Witten
  • David Maulsby

Abstract

Adaptive compression methods build models of symbol sequences. In many areas of computer science, models of sequences are constructed for their explanatory value. In contrast, data compression schemes use models that are opaque, in that they do not provide descriptions of the sequence that can be understood or applied in other domains. Statistical methods that compress text well invariably generate large models that are not so much a structural description of the sequence as a record of frequencies of short substrings. Macro models replace repeated text by references to earlier occurrences and generally work within a small moving window of symbols, so any implicit model is transient. In both cases the model is flat and does not build up abstractions by combining references into higher-level phrases.
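The contrast between flat models and models that combine references into higher-level phrases can be made concrete with a small grammar. The Python sketch below is not the method of this paper. It assumes a simple offline scheme, in the spirit of the Re-Pair algorithm: repeatedly replace the most frequent repeated pair of adjacent symbols with a new rule. The function names and the rule names `R1`, `R2`, ... are hypothetical. Because a rule body may refer to earlier rules, the result is a hierarchy of phrases rather than a flat list of back-references.

```python
from collections import Counter


def count_digrams(seq):
    """Count non-overlapping occurrences of each adjacent pair (digram), left to right."""
    counts = Counter()
    last_start = {}
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        # In runs such as "aaa" the pairs starting at i and i+1 overlap; count only one.
        if last_start.get(pair) == i - 1:
            continue
        counts[pair] += 1
        last_start[pair] = i
    return counts


def replace_digram(seq, pair, name):
    """Replace non-overlapping occurrences of `pair`, scanning left to right, with `name`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(name)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def induce_grammar(text):
    """Greedy offline digram replacement; returns (top-level sequence, rules).

    Terminals are the single characters of `text`; nonterminals get the
    hypothetical multi-character names "R1", "R2", ... so they cannot
    collide with terminals.
    """
    seq = list(text)
    rules = {}
    while True:
        counts = count_digrams(seq)
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:  # no pair repeats, so a new rule would save nothing
            break
        name = f"R{len(rules) + 1}"
        rules[name] = list(pair)  # the body may contain earlier rules: hierarchy
        seq = replace_digram(seq, pair, name)
    return seq, rules


def expand(symbol, rules):
    """Recursively expand a symbol back into the text it stands for."""
    if symbol not in rules:
        return symbol
    return "".join(expand(s, rules) for s in rules[symbol])


if __name__ == "__main__":
    top, rules = induce_grammar("abcabcabc")
    print(top, rules)
```

For `"abcabcabc"` the sketch yields the top-level sequence `R2 R2 R2` with `R1 → a b` and `R2 → R1 c`: the phrase `abc` is built from the smaller phrase `ab`, and expanding the grammar reproduces the input exactly.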

Related articles

(0, 1)-Matrix-Vector Products via Compression by Induction of Hierarchical Grammars

We demonstrate a method for reducing the number of arithmetic operations in a (0, 1)-matrix-vector product. We employ SEQUITUR, an algorithm developed for lossless text compression, which generates a context-free grammar from the inherent hierarchy of repeated sequences. Here, the sequences are composed of bit patterns for a set of adjacent columns. This grammar will repre...
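The saving comes from evaluating each repeated pattern once and reusing its partial sum. The Python sketch below assumes a simplified encoding that need not match the cited paper's: each row of the (0, 1)-matrix is written as its sequence of nonzero column indices, and the rules `A` and `B` are hypothetical and hand-written rather than induced by SEQUITUR. The function names are likewise illustrative.

```python
def matvec_dense(matrix, x):
    """Reference (0, 1)-matrix-vector product: each row sums the x entries where it has a 1."""
    return [sum(xi for bit, xi in zip(row, x) if bit) for row in matrix]


def expand(symbol, rules):
    """Expand a symbol into the column indices it covers."""
    if isinstance(symbol, int):
        return [symbol]
    return [c for s in rules[symbol] for c in expand(s, rules)]


def matvec_grammar(rows, rules, x):
    """Evaluate y = Mx from a grammar encoding of M's rows; returns (y, additions).

    A symbol is either an int (a column index) or a str naming a non-empty rule.
    Each rule's partial sum is computed once and cached, so a pattern shared
    by several rows pays for its additions only once.
    """
    cache = {}
    additions = 0

    def value(symbol):
        nonlocal additions
        if isinstance(symbol, int):
            return x[symbol]
        if symbol not in cache:
            parts = [value(s) for s in rules[symbol]]
            additions += len(parts) - 1
            cache[symbol] = sum(parts)
        return cache[symbol]

    y = []
    for row in rows:
        parts = [value(s) for s in row]
        additions += max(len(parts) - 1, 0)
        y.append(sum(parts))
    return y, additions


M = [
    [1, 1, 1, 0, 1],
    [1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1],
]
# Hypothetical grammar: A covers columns 1, 2 (shared by all rows);
# B covers column 0 followed by A (shared by rows 0 and 1).
RULES = {"A": [1, 2], "B": [0, "A"]}
ROWS = [["B", 4], ["B", 3], ["A", 3, 4]]

if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y, adds = matvec_grammar(ROWS, RULES, x)
    dense_adds = sum(max(sum(row) - 1, 0) for row in M)
    print(y, matvec_dense(M, x), adds, dense_adds)
```

With `x = [1, 2, 3, 4, 5]` the sketch returns `y = [11, 10, 14]`, matching the dense product, while using 6 additions instead of the 9 needed when each row is summed independently.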

Unsupervised Grammar Induction in a Framework of Information Compression by Multiple Alignment, Unification and Search

This paper describes a novel approach to grammar induction that has been developed within a framework designed to integrate learning with other aspects of computing, AI, mathematics and logic. This framework, called information compression by multiple alignment, unification and search (ICMAUS), is founded on principles of Minimum Length Encoding pioneered by Solomonoff and others. Most of the p...

Compression and Explanation Using Hierarchical Grammars

This paper describes an algorithm, called SEQUITUR, that identifies hierarchical structure in sequences of discrete symbols and uses that information for compression. On many practical sequences it performs well at both compression and structural inference, producing comprehensible descriptions of sequence structure in the form of grammar rules. The algorithm can be stated concisely in the form...

Conciseness of Associative Language Descriptions

Associative Language Descriptions (ALD) are a recent grammar model, theoretically less powerful than context-free (CF) grammars but adequate for describing the syntax of programming languages. ALD do not use nonterminal symbols; instead they rely on permissible contexts to specify valid syntax trees. To assess the adequacy of ALD, we analyze the descriptional complexity of structurally equivalent CF and ALD...

Associative Definition of Programming Languages

Associative Language Descriptions (ALD) are a recent grammar model, theoretically less powerful than context-free (CF) grammars but adequate for describing the syntax of programming languages. ALD do not use nonterminal symbols; instead they rely on permissible contexts to specify valid syntax trees. To assess the adequacy of ALD, we analyze the descriptional complexity of structurally equivalent CF and ALD...

Publication date: 1994